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ABSTRACT 

When designing genetic circuits, the typical primitives used 
in major existing modelling formalisms are gene interaction 
graphs, where edges between genes denote either an acti¬ 
vation or inhibition relation. However, when designing ex¬ 
periments, it is important to be precise about the low-level 
mechanistic details as to how each such relation is imple¬ 
mented. The rule-based modelling language Kappa allows 
to unambiguously specify mechanistic details such as DNA 
binding sites, dimerisation of transcription factors, or co¬ 
operative interactions. However, such a detailed description 
comes with complexity and computationally costly execu¬ 
tion. We propose a general method for automatically trans¬ 
forming a rule-based program, by eliminating intermediate 
species and adjusting the rate constants accordingly. Our 
method consists of searching for those interaction patterns 
known to be amenable to equilibrium approximations (e.g. 
Michaelis-Menten scheme). The reduced model is efficiently 
obtained by static inspection over the rule-set, and it rep¬ 
resents a particular theoretical limit of the original model. 
The Bhattacharyya distance is proposed as a metric to esti¬ 
mate the reduction error for a given observable. The tool is 
tested on a detailed rule-based model of a A-phage switch, 
which lists 96 rules and 16 agents. The reduced model has 
11 rules and 5 agents, and provides a dramatic reduction in 
simulation time of several orders of magnitude. 

1. INTRODUCTION 

One of the main goals of synthetic biology is to design and 
control genetic circuits in an analogous way to how elec¬ 
tronic circuits are manipulated in human made computer 
systems. The field has demonstrated success in engineering 
simple genetic circuits that are encoded in DNA and perform 
their function in the cellular environment [20], [24]. How¬ 
ever, there remains a need for rigorous quantitative charac¬ 
terisation of such small circuits and their mutual compati¬ 
bility [33], which in electronic circuits is easily guaranteed 
by impedance matching. The important ingredient towards 
such characterisation is having an appropriate language for 


capturing model requirements, for prototyping the circuits, 
and for predicting their quantitative behaviour before com¬ 
mitting to the time-intensive experimental implementation. 


Quantitative modelling of biomolecular systems is particu¬ 
larly challenging, because one deals with stochastic, highly 
dimensional, non-linear dynamical systems. For these rea¬ 
sons, modellers often immediately apply ad-hoc simplifica¬ 
tions which neglect the mechanistic details, but allow to pre¬ 
dict (simulate) the system’s behaviour as a function of time. 
For example, the fact that protein A activates protein P is 
often modelled immediately in terms of a reaction A —> A+P 
with the Hill kinetic coefficient (e.g. i+^[A ] n )? while the 

mechanism in fact includes the formation of a macromolec- 
ular complex and its binding to a molecular target. While 
such models are easier to execute, the simplification makes 
models hard to edit or refine. For example - a new exper¬ 
imental insight about an interaction mechanism cannot be 
easily integrated properly into the model, since several mech¬ 
anistic steps are merged into a single kinetic rate. Moreover, 
an abstract model does not provide precise enough design 
guide for circuit synthesis, and sometimes, only the more 
detailed models explain certain behaviours (e.g., in [l5], it is 
shown that only when incorporating the mRNA, the model 
explains certain experimentally observed facts). 


Rule-based languages, such as Kappa [l8] or BioNetGen [ 4 ], 
are designed to naturally capture the protein-centric and 
concurrent nature of biochemical signalling: the internal 
protein structure is maintained in form of a site-graph, and 
interactions can take place upon testing only patterns , local 
contexts of molecular species. A site-graph is a graph where 
each node contains different types of sites, and edges can 
emerge from these sites. Nodes typically encode proteins 
and their sites are the protein binding-domains or modi¬ 
fiable residues; the edges indicate bonds between proteins. 
Then, every species is a connected site-graph, and a reaction 
mixture is a multi-set of connected site-graphs. The execu¬ 
tions of rule-based models are traces of a continuous-time 
Markov chain (CTMC), defined according to the principles 
of chemical kinetics. In general, rule-based models are ad¬ 
vantageous to the classical reaction models (Petri nets) for 
two major reasons. First, the explicit graphical represen¬ 
tation of molecular complexes makes models easy to read, 
write, edit or compose (by simply merging two collections 
of rules). For example, the reaction of dimerization between 
two lambda Cl molecules is classically written 2 Cl Cl 2, 
where the convention is that Cl represents a free monomer, 



and CI 2 represents a free dimer. On the other hand, the 
same reaction written in Kappa amounts to: 

’CI2:’ Cl(ci,or), Cl(ci,or) x-x Cl(ci!l,or), Cl(ci!l,or)@k 2 +, k 2 -, 

where the binding sites ci and or are binding sites of the 
protein CI, and Cl(ci!l,or) denotes that the identifier of the 
rule-based bond account for the physical interaction between 
the two CI monomers, is 1. Secondly, a rule set can be exe¬ 
cuted, or subjected to formal static analysis: for example, it 
provides efficient simulations [ll], |29 automated answers 
about the reachability of a particular molecular complex 
[l3| , or about causal relations between rule executions 10 . 

The downside of incorporating too many mechanistic details 
in the model, is that they lead to computationally costly 
execution. For this reason, we define and implement an effi¬ 
cient method for automatically detecting and applying equi¬ 
librium approximations. As a result, one obtains a smaller 
model, where some species are eliminated, and the kinetic 
rates are appropriately adjusted. In this way, the experimen¬ 
talist can choose to obtain the predictions more efficiently 
but less accurately, however without losing track of the un¬ 
derlying low-level mechanisms. 

To the best of our knowledge, there exist no efficient meth¬ 
ods to quantify the error induced by time-scale separation 
approximations for biochemical reaction networks. The bot¬ 
tleneck is the complexity of the original system, whose be¬ 
haviour is computationally costly to analyse - often even 
to run a single simulation trace. The correctness of our ap¬ 
proach relies on the fact that the approximate model is equal 
to the original one, in the artificial limit where certain reac¬ 
tions happen at a sufficiently larger time-scale than others, 
and they are seemingly equilibrated shortly upon the reac¬ 
tions initiate. Faced with designing or modelling biological 
circuits with many connections and highly heterogeneous 
hardware, the ability of predicting solutions that lack a pre¬ 
cise error measurement is of secondary importance. What 
is desirable is to have a prototype or model of a circuit that 
displays the desired behaviour at a qualitative level, which 
later on can be further improved. Furthermore, most ki¬ 
netic rates are rarely precisely determined experimentally, 
and hence precise quantitative error estimates are not nec¬ 
essarily relevant and on top are time consuming, when faced 
with imprecise input characteristics of the underlying pro¬ 
totype model. 

Implementation and testing. The tool is implemented 
in OCaml, and it is tested on a detailed rule-based model of 
a A-phage switch [37], [38]. Simulations were carried out on 
the complete chemical reaction genetic circuit model which 
contains 96 rules, 16 agents and 61 species. The model is 
reduced to only 11 rules and 5 agents. It is worthwhile em¬ 
phasising that our reduction method is general - applicable 
to any rule-set, and that the reduced model is obtained al¬ 
most instantaneously. 

Related work. Our method is inspired from the work pre¬ 
sented in [36 and [32], with the adaptations which arise 
due to the differences between the reaction lists and the 
rule-based language. Apart from rule-based models, other 
formalisms were proposed for characterisation of synthetic 
devices, such as, e.g., linear temporal logic |2 . In a broader 
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Figure 1: An example of a rule-based model. The 
transcription factor T binds to the operator and ini¬ 
tiates the production of protein P. 


context, the principle of obtaining conclusions about sys¬ 
tem’s dynamics by analyzing their model description, origi¬ 
nates from, and is exhaustively studied in the field of formal 
program verification and model checking T], 6 , while it 
is recently gaining recognition in the context of programs 
used for modeling biochemical networks. An example is the 
related work of detecting fragments for reducing the deter¬ 
ministic or stochastic rule-based models [l6], [l9|, [l7 , de¬ 
tecting the information flow for ODE models of biochemical 
signaling |26[ [5], or the reaction network theory [8]. Related 
works on systematic model reduction techniques are based 
on separating time-scales [28 39, 30, 23, 29 , or they propose 
numerical algorithms which focus of efficiently obtaining the 
evolution of the probability distribution over time (the mas¬ 
ter equation) [34] [27] . 


Paper outline. Section [5] introduces two concepts: (i) the 
classical stochastic and deterministic model of chemical re¬ 
action networks, and (ii) the rule-based modelling language 
Kappa. Section [3] illustrates the equilibrium approximation 
schemes (the generalized Michaelis-Menten and fast dimeri- 
sation) and discusses the theoretical guarantees about the 
approximate system. In Section [4] we outline the algorithms 
for detecting the approximation schemes (operator site re¬ 
duction and dimerisation reduction). Finally, in Section [5] 
we describe the A-phage model and we compare the results 
and the CPU time for the original and approximate model. 


2. PRELIMINARIES 

2.1 Stochastic chemical reaction networks 

For a well-mixed reaction system with molecular species S = 
{S 1 ,..., S n }, the state of a system can be represented as a 
multi set of those species, denoted by x = (aq,..., x n ) £ N n . 
The dynamics of such as system is determined by a set of 
reactions R =s {ri,...,ry}. Each reaction is a triple Xj = 
(a^, Vj , Cj ) G N n x N n x M>o, written down in the following 
form: 

j Si , . . . , CL n j Sn 4 j Si , . . . , CL n j Sn 5 
such that a'ij = dij + Vij . 

The vectors a j and a'- are often called respectively the con¬ 
sumption and production vectors due to reaction Xj , and kj is 
the kinetic rate of reaction r j. If the reaction r j occurs, after 
being in state x, the next state will be x.' = x + i/j. This will 
be possible only if xi > aji for allz = 1,..., n. Under certain 
physical assumptions [2R, the species multiplicities follow 
a continuous-time Markov chain (CTMC) (A(t)}, defined 










over the state space S = {x | x is reachable from xo in R}. 
Hence, the probability of moving to the state x + Vj from x 
after time A is 

P (X(t + A) = x + vk | X(t) = x) = Aj(x)A + o(A), 

with A j the propensity of jth reaction, assumed to follow 
the principle of mass-action: Aj(x) = W=i(£)- The 

binomial coefficient ( Xi ) reflects the probability of choosing 

\CLij / 

dij molecules of species Si out of x\ available ones. 

2.2 Deterministic limit 

In the continuous, deterministic model of a chemical reaction 
network, the state z (t) = (z 1, . • •, z n )(t) G M n is represented 
by listing the concentrations of each species. The dynamics 
is given by a set of differential equations in form 

= Vij Cj ft Zi(t ) aii , (1) 

3 =1 i=1 

where Cj is a deterministic rate constant, computed from 
the stochastic one and the volume N from Cj — kjN 
(|x| denotes the 1-norm of the vector x). The deterministic 
model is a limit of the stochastic model when all species in a 
reaction network are highly abundant. Denote by Rj(t) the 
number of times that the j-th reaction had happened until 
the time t. Then, the state of the stochastic model at time 
t is 

r 

X(t) = X(0) + '£'R j (t)» j . (2) 

3 = 1 

The value of Rj(t) is a random variable, that can be de¬ 
scribed by a non-homogenous Poisson process, with parame- 
ter fo V'(X(s))ds, that is, Rj(t) = £,•(/„* Aj(X(s))ds). Then, 
the evolution of the state X(£) is given by the expression 

X(t) = X(0) + gdf A,(X( s ))d s ) Vj . (3) 

By scaling the species multiplicities with the volume: Z* ( t ) = 
Xi(t)/N, adjusting the propensities accordingly, in the limit 
of infinite volume N oo, the scaled process Z (t) follows 
an ordinary differential equation 0 M- 

It is worth mentioning here that the above scaling from 
stochastic to the deterministic model i s a special case of 
a more general framework presented in 30], referred to as 
the multiscale stochastic reaction networks. Intuitively, the 
deterministic model is a special case where all species are 
scaled to concentrations and reaction rates are scaled always 
in the same way, depending on their arity. The reductions 
shown in this paper will be based on a variant of multiscale 
framework, where some species are scaled to concentrations 
and others are kept in copy numbers, and where reaction 
rates have varying scales as well. 

2.3 Rule-based Models 

In this section, we introduce the rule-based modeling lan¬ 
guage Kappa, which is used to specify chemical reaction 
networks, by explicitly denoting chemical species in form 
of site-graphs. A simple example of a Kappa model is pre¬ 
sented in Fig. [l] 


For the stochastic semantics of Kappa, that is a continuous¬ 
time Markov chain (CTMC) assigned to a rule-based model, 
we refer to [l2] or [l7 . Intuitively, any rule-based system can 
be expanded to an equivalent reaction system (with poten¬ 
tially infinitely many species and reactions). The stochastic 
semantics of a Kappa system is then the CTMC (X(t)} as¬ 
signed to that equivalent reaction system. Even though the 
semantics of a Kappa system is defined as the semantics 
of the equivalent reaction system, in practice, using Kappa 
models can be advantageous for several reasons - they are 
easy to read, write, edit or compose, they can compactly 
represent potentially infinite set of reactions or species, and, 
most importantly, they can be symbolically executed. 

We present Kappa in a process-like notation. We start with 
an operational semantics. 

Given a set A, p(X) denotes the power set of X (i.e. the set 
of all subsets of A). We assume a finite set of agent names 
A, representing different kinds of proteins; a finite set of sites 
<S, corresponding to protein domains; a finite set of internal 
states I, and E t , E^ two signature maps from A to p(<S), 
listing the domains of a protein which can bear respectively 
an internal state and a binding state. We denote by E the 
signature map that associates to each agent name A G A 
the combined interface E t (A) U E 13 (A). 

Definition 1. (Kappa agent) A Kappa agent A(a) is de¬ 
fined by its type A G A and its interface a. In A(cr), 
the interface a is a sequence of sites s in E(A), with in¬ 
ternal states (as subscript) and binding states (as super¬ 
script). The internal state of the site s may be written as 
s e , which means that either it does not have internal states 
(when s G E(A) \ E t (A)), or it is not specified. A site that 
bears an internal state m G I is written s m (in such a case 
s G E t (A)). The binding state of a site s can be specified as 
s e , if it is free , otherwise it is bound (which is possible only 
when s G E /3 (A)). There are several levels of information 
about the binding partner: we use a binding label i G N 
when we know the binding partner, or a wildcard bond — 
when we only know that the site is bound. The detailed 
description of the syntax of a Kappa agent is given by the 
following grammar: 


a ::= N(a) 

(agent) 

N ::= A G A 

(agent name) 

a ::= e | s,cr 

(interface) 

s ::= n * 

(site) 

n ::= x G S 

(site name) 

i e | ra G I 

(internal state) 

A ::= e — | i G N 

(binding state) 


We generally omit the symbol e. 

Definition 2. (Kappa expression) Kappa expression E is 
a set of agents A(a) and fictitious agents 0. Thus the syntax 
of a Kappa expression is defined as follows: 

E ::=e I a , E \ 0 , E. 

The structural equivalence =, defined as the smallest binary 
equivalence relation between expressions that satisfies the 




rules given as follows 

E , A(v,8,8',<t') ,E' = E, A(a,s' ,s,a') , E' 

E , a , a' , E' = E , a' , a , E' 

E = E , 0 

i,j 6N and z does not occur in E 

WTil = E 

z G N and z occurs only once in E 
E\e/i\ = E 

stipulates that neither the order of sites in interfaces nor the 
order of agents in expressions matters, that a fictitious agent 
might as well not be there, that binding labels can be injec¬ 
tively renamed and that dangling bonds can be removed. 

Definition 3. (Kappa pattern,mixture and species) 

A Kappa pattern is a Kappa expression which satisfies the 
following five conditions: (i) no site name occurs more than 
once in a given interface; (ii) each site name s in the interface 
of the agent A occurs in E(A); (iii) each site s which occurs 
in the interface of the agent A with a non empty internal 
state occurs in E t (A); (iv) each site s which occurs in the 
interface of the agent A with a non empty binding state 
occurs in E*(A); and (v) each binding label z G N occurs 
exactly twice if it does at all — there are no dangling bonds. 
A mixture is a pattern that is fully specified, i.e. each agent 
A documents its full interface E (A), a site can only be free 
or tagged with a binding label z G N, a site in E t (A) bears an 
internal state in I, and no fictitious agent occurs. A species 
is a connected mixture, i.e. for each two agents Ao and A 
there is a finite sequence of agents Ai,..., Ak s.t. there is a 
bond between a site of Ak and of A and for z = 0,1,..., k — 1, 
there is a site of agent A% and a site of agent Ai+i. 

Definition 4- (species occurring in a pattern) Given Kappa 
patterns E s and E p , if E s defines a Kappa species, and E s 
is a substring of E p , we say that a species E s occurs in a 
pattern E p . 

Definition 5. (Kappa rule) A Kappa rule r is defined by 
two Kappa patterns Ei and E r , and a rate k G M>o, and is 
written: r — Eg E r @k. 

A rule r is well-defined, if the expression E r is obtained 
from Ei by finite application of the following operations: 
(i) creation (some fictitious agents 0 are replaced with some 
fully defined agents of the form A(<r), moreover a documents 
all the sites occurring in E(A) and all site in E t (A) bears 
an internal state in I), (ii) unbinding (some occurrences of 
the wild card and binding labels are removed), (iii) deletion 
(some agents with only free sites are replaced with fictitious 
agent 0), (iv) modification (some non-empty internal states 
are replaced with some non-empty internal states), (v) bind¬ 
ing (some free sites are bound pair-wise by using binding 
labels in N). 

In our static inspection of rules, we will test species (fully 
defined connected mixtures). To this end, we adopt the 
terminology of reactant , modifier and product from |32| . 

Definition 6. (reactant, modifier, product) Given a rule 
( Ei , E r ), a Kappa species s is called 



c ) 


Figure 2: Example shown in Fig. [lj The mean 

protein expression for one hundred sampled traces, 
before and after the enzymatic catalysis reduction, 
a) Parameters k\ — 0.2156, fe = 1, k% — 0.014 and there 
are initially 50 transcription factors. The mean and 
standard deviation (not shown) are computed for 
each time point, for the original (full line) and re¬ 
duced model (dotted line), b) Parameters & 2 , & 3 , 
and the initial number of transcription factors T are 
scaled up by factor N = 10. Same notation as for a) 
c) The Bhattacharyya distance between the distri¬ 
butions of the protein level with a model before and 
after the reduction. Red plot refers to the param¬ 
eter values shown in a), and the green plot to the 
scaled parameter values shown in b). 

• a reactant , if it occurs in pattern Ei and does not occur 
in pattern E r , 

• a modifier , if the number of occurrences in pattern Ei 
equals the number of occurrences in pattern E r , 

• a product , if it does not occur in pattern Ei , and it 
occurs in pattern E r . 

Definition 7. (Kappa system) A Kappa system P(xo,(D, 
{r i,..., r n }) is given by an initial mixture xo, a set of Kappa 
patterns O called observables , and a finite set of rules {n,..., 

3. MODEL APPROXIMATION 

In this section, we present the mathematical analysis under¬ 
lying the approximation algorithms presented in Section [4| 
Our reductions will be based on three reduction schemes: 
enzymatic catalysis reduction, generalized enzymatic catal¬ 
ysis reduction and fast dimerization reduction. 

3.1 Enzymatic reduction 

Assume the elementary enzymatic transformation from a 
substrate S' to a product P, through the intermediate com¬ 
plex E : S: 

E + S^E:SHE + P, (4) 

k 2 









which our algorithm will convert to the well-known Michaelis- 
Menten form 

,/eg Fjrp H 

S 1+ ^ xs P, (5) 

where Et = XE(t) + XE-.s(t) denotes the total concentration 
of the enzyme, and K = 1 * 2 +k 3 • 

The above approximation is generally considered to be suf¬ 
ficiently good under different assumptions, such as, for ex¬ 
ample, that the rate of dissociation of the complex to the 
substrate is much faster than its dissociation to the product 
(i.e. &2 fe), also known as the equilibrium approxima¬ 
tion. Even if the equilibrium condition is not satisfied, it 
can be compensated in a situation where the total number 
of substrates significantly outnumbers the enzyme concen¬ 
tration - xs(0) + K Et, known as the quasi-steady-state 
assumption. 


Whenever one of the above assumptions holds, the quantity 
of the intermediate complex can be assumed to be rapidly 
reaching equilibrium, that is, ^XE-.T(t) = 0. Then, it is 
straightforward to derive the rate of direct conversion from 
substrate to product: 


d 


d t xp 


k^EpK 

-1 . r? - X S, 

1 + Kx s 


which exactly corresponds to the equation for the rule ft 


(as a consequence of being related to the bimolecular, and 
not unimolecular reaction). 

A complete proof is provided in [30]. We here outline the 
general idea. Let N > 0 be a natural number, and let 
Z S (t) = X S (t)/N , Z E (t) = X E (t), Z S :E (t) = X 3 :E(t), 
Z P (t) = Xp(t)/N. Writing out the scaled random time- 
change model for the substrate gives: 


Z s (t) = Z s (0) - N - 1 


+ N - 1 


£i (N f 

JO 

UN f 

Jo 


j 1 Z s (s)Z E (s)ds) 


72 ^S:s(s)ds), 


and writing out the scaled random time-change model for 
the complex gives: 


Ze-.s (t') — ZE:S ( 0 ) + £i(N f ^fiZ s (s)Z E (s)ds) 
Jo 

~&(N [ ^2Z S :E(s)ds) 

Jo 

— £3 (N f ^Zs-.E{s)ds). 

Jo 


The informal terminology of being ‘significantly faster’, mo¬ 
tivated the rigorous study of the limitations of the approx¬ 
imations based on separating time scales. While the enzy¬ 
matic (Michaelis-Menten) approximation has been first in¬ 
troduced and subsequ ent ly studied in the context of deter¬ 
ministic models (e.g. 35], Ch.6), it was more recently that 
the time-scale separation was investigated in the stochastic 
context [39], [25 , [9], [28], [40], [22]. Notably, the following 
result from 14| (also shown as a special case of the multi 
scale stochastic analysis from 30]), shows that, under an ap¬ 
propriate scaling of species’ abundance and reaction rates, 
the original model and the approximate model converge to 
the same process. 


Theorem 1. (Darden \lf\ , Kang \30\). Consider the re¬ 
action network 0 (equivalently the rule-based system de¬ 
picted in Fig. and denote by Xs(t), Xsif), XE-.sif ) and 
Xp(t ) the copy numbers of the respective species due to the 
random-time change model ©• Denote by Et — XE:s(t ) + 
X E (t) and Ve( t) = fy N~ x Xe(s)ds and assume that N = 
0(Xs). Assume that k\ -+ 71 , fe/iV -+ 72 , k^/N -+ 73 , 
N ->• oo ; and ^(o). Then (^+VE(i)) con- 

verges to (xs(t),VE(t)) and 


d_ 

dt 


ve(s) 


Et 

1 + Kx e (s) 


and 


d E T ^Kxs(t) 

— Xs — -^-, 

dt 1 + Kx s (t) 


After dividing the latter with N , and applying the law of 
large numbers, we obtain the balance equations analogous 
to assuming that the complex is at equilibrium. This equa¬ 
tion implies the expression for -^ve(s). The equation for 
■Axs follows from the model of Zs(t ): we first use the con¬ 
servation law Zs-.e(s) = N~ x Ep — Zsit) and then substitute 
the obtained value of ^ve(s). 


In order to confirm that the reduction is appropriate, our 
goal is now to show that the scaled versions of the original 
model © and the reduced model © are equivalent in the 
limit when N -+ 00 . Let Zp(t ) := N~ 1 Xp(t) be the scaled 
random time change for the product in the original model, 
and Zp(t) := N~ 1 Xp(t) in the reduced model. Notice that, 
from the balance equations, A. Xp = According to 

the reduced system ©. the random time change for the 
product is given by 

Zp(t) = Zp{ 0) + N-'ti f -^1+— NZs(s)ds) 

Jo l + KNZ s (s) 

= Z P ( 0) + N-'ti f N 73 ^ t A Zs(a)da). 

Jo 1 + KZ s (s) 

Passing to the limit, we obtain the desired relation ^zp{t) — 
f t z P {t). 


where K — 


71 

72+73 ' 


The assumptions listed in the theorem capture the that: (i) 
Xs and Xp are scaled to concentrations, while Xe and Xe-.s 
remain in copy numbers; (ii) the stochastic reaction rate k\ 
is an order of magnitude smaller than the rates k^ and fe 


The above Theorem does not provide the means of com¬ 
puting the approximation error, or an algorithm which sug¬ 
gests which difference in time-scales is good enough for an 
approximation to perform well. Rather, this result shows 
that the enzymatic approximation is justified in the limit 
when the assumptions about the reaction rates and species’ 
abundance are met. In other words, when N —y 00 , the 












scaled versions of the original and reduced models - e.g. 
Zp(t) = N~ 1 Xp(t) and Zp = N~ 1 Xp - both converge to 
at the same, well-behaved process. This provides confidence 
that the actual process Xp is a good approximation of the 
process Xp. 

In our reduction algorithm (Section [4]), we will apply the 
reduction whenever the pattern is detected. In order to 
ensure the validity of the approximation in the context of 
other rules, we will additionally check that the enzyme E 
(resp. complex E : S ) have initially low copy number (zero 
resp.), and that they don’t appear in any other rule (unless 
it is another enzymatic catalysis scheme). 

Example 1. To illustrate the meaning of the Theorem^ 7J 
we apply our reduction method on a small example shown 
in Fig. [7J We plot the mean and we compute the standard 
deviation of the protein level for the original and for the re¬ 
duced model. Then, we scale up the parameters k<i and fe, 
as well as the initial concentration of transcription factor T, 
in order to mimic the effect of choosing a larger N in The¬ 
orem [7J The deviation between the curves is decreased, as 
can be seen in Fig. ^ In order to obtain the error of using 
the reduced system instead of the original one, we compute 
the Bhattacharyya distance for each time point, for the ac¬ 
tual parameter set and for the scaled parameter set. As ex¬ 
pected, the distance is overall smaller in the scaled system. 
Especially in the scaled system (green line), we can observe 
that initially, the distance is larger, and then it decreases 
with time. This is because the original system takes time to 
reach the equilibrium state which is, in the reduced system, 
assumed immediately. 

3.2 Generalised enzymatic reduction 

The enzymatic approximation can be generalized to a sit¬ 
uation where many sets of substrates compete for binding 
to the same enzyme. Consider a sub-network of n reactions 
where the z-th such reaction reads: 

E +Si,! + ...+ Si, mi E E: Si,Si, mi A E + Pi. 

k i 

The resulting approximation is 

k^ Erp K-^x g. 

Sip + • • • + Si,rrii Pi, 

where x Si = x Sitj , Z m l + E, G{ i,...,n } x Sj and 

Et = Xpit) + EILi x E:S iA :...:S ijrn . (t). The latter expression 
follows from ppXE:S iX ....:S i , rn .(t) = 0 for all i = 0 ,... ,n. 

The correctness of the generalized enzymatic reduction can 
be shown with the same technique as Theorem [I] Each 
substrate and product should be scaled to concentrations, 
while all intermediate complexes and the enzyme remain in 
copy numbers. The relations between the reaction rates are 
equivalent. 

3.3 Fast dimerization reduction 

k 

Consider now the dimerisation reaction M+M M 2 . As- 

k~ 

suming that both rates k and k~ are fast comparing to other 
reactions involving M or M 2 , it is common to assume that 


Input : A Kappa system 1Z — (xo, O, {n,..., r n }). 
Output: A Kappa system TZ' = ( xq , O , {rq, ..., r^}). 

1 n <— me ( n ), n <— src(^) 

2 TZ i — Fast dimerization reduction (TZ) 

3 K<— ME (TZ), 1Z <— SRC(TZ) 

4 1Z i — Generalized enzymatic catalysis reduction ( TZ ) 

5 TZi — ME (TZ), TZ <— SRC(^) 

Algorithm 1: Approximation algorithm. ‘ME’ is a short¬ 
hand for ‘modifier elimination’ and ‘SRC’ is shorthand for 
similar rule composition. The exact reductions are per¬ 
formed before and after each of the two other reductions. 

the reaction is equilibrated, that is, kxM(t) 2 —k~XM 2 (t) = 0, 
where XM{t) and XM 2 (t) denote the copy number at time £, 
of monomers and dimers respectively. Such assumption al¬ 
lows us to eliminate the dimerization reactions, and only 
the total amount of molecules M needs to be tracked in the 
system. The respective monomer and dimer concentrations 
can be expressed as fractions of the total concentration: 

xm (t) — ^y/8 KMp(t) + 1 — , and 

M T (t) 1 m 

XM 2 (t) = — -- x M (t ), 

where K — jX and Mp(t) — XM{t) + 2 XM 2 (f)- 

The correctness of the generalized enzymatic and dimeriza¬ 
tion reduction can be shown with the same technique as 
Theorem [l] In the context of multiscale stochastic reaction 
networks, both reaction rates should be treated as fast. 

4. REDUCTION ALGORITHM 

The idea of the reduction is to transform a Kappa system 
TZ to a Kappa system TZ' with fewer rules and fewer agents, 
while still capturing the observables and the relevant dy¬ 
namics. Our algorithm statically analyzes the rule-set, in 
search for one of the following mechanistic schemes: 

• the modifier elimination and similar rule composition , 
that are the patterns amenable to the exact reduction 
(providing the equivalent rule-based model with fewer 
rules), as well as 

• the generalized enzymatic catalysis , enzymatic cataly¬ 
sis and fast dimerization reductions, three interaction 
patterns amenable to approximate reduction based on 
time-scale separation (Section [ 3 }. 

Recall that the generic framework for time-scale separation 
in biochemical reaction networks is shown in [30]. A special 
case of this framework is Theorem [l] which confirms that 
using the classical enzymatic approximation in stochastic 
setting is adequate. After detecting one of the five interac¬ 
tion patterns, our algorithms, similarly as in '32], perform 
additional checks, in order to avoid the situations where the 
equilibrium assumptions are violated due to interleavings 
with the rest of the reaction network. 

The top-level algorithm is shown in Alg.[l] We next describe 
each of the five interaction patterns in more detail. 









4.1 Similar rule composition 

In similar rule composition scheme, rules have the same re¬ 
actants, modifiers and products, but different rates. Our 
algorithm combines them into a single rule, by summing 
their rate laws. Notice that this reduction is exact, that is, 
applying the similar rule composition does not change the 
semantics of the rule-based system. 

4.2 Modifier elimination 

This reduction can be applied when a species only appears 
as a modifier throughout a rule-based system. Such a species 
will never change its copy number throughout the dynamics, 
and therefore, its quantity will be constant. The species be¬ 
ing always a modifier does affect the dynamics of the system, 
and all the rule rates where the species was involved need 
to be adapted. Concretely - after the species is eliminated, 
each rate law will be multiplied by the initial copy number 
of this species. Notice that modifier elimination reduction 
is exact, that is, applying the modifier elimination does not 
change the semantics of the rule-based system. 

4.3 Fast dimerization reduction 

The algorithm searches for dimerisation rules. Suppose that 
a pair of reversible reactions M + M ++ M 2 is detected. Be¬ 
fore proceeding to the reduction, we check whether a dimer 
is produced elsewhere, or if the monomer is a modifier else¬ 
where. These checks are necessary because they prevent 
from deviating from the assumed equilibrium. Finally, if 
all checks passed, the dimerization reaction can be elimi¬ 
nated. A new species Mt is introduced, and, wherever the 
monomer M or dimer M 2 were involved, they are replaced 
by the species Mr, and the rate is adapted accordingly, by 
the expressions shown in Section [3.3| 

4.4 Generalised enzymatic reduction 

The algorithm searches for the scheme described in Sec¬ 
tion [ff2] by searching for candidate enzymes. Each pattern 
is tested as to whether it is catalyzing some enzymatic re¬ 
duction. If a pattern s indeed is an enzyme (operator) in 
an enzymatic reaction scheme, a set of all patterns c which 
compete to bind to s is formed, as well as the set of their 
complexes sc. Then, before proceeding with the reduction, 
additional tests must be performed: (i) pattern s must be a 
species, and it is not an observable, (ii) s must be small in 
copy number, that is, its initial copy number is smaller than 
a threshold, (iii) s can neither be produced, nor degraded, 
(iv) complex sc is not an observable and is never appearing 
in another rule of 1Z and has initially zero abundance. Then, 
the patterns s and sc can be eliminated from the rule-set and 
the reaction rates are adjusted according to the description 
in Section [ff2l 

Often times, enzymatic catalysis reduction is appropriate to 
eliminate the binding of the transcription factor to the op¬ 
erator site. In this context, the operator site takes the role 
of the enzyme, and transcription factor(s) the role of the 
substrate. Whenever a candidate enzyme is detected, and 
the other algorithm checks pass, the rates are appropriately 
scaled. The competitive enzymatic reduction is suitable in a 
situation when more transcription factors compete for bind¬ 
ing the enzyme, each in a different reaction. In other words, 
the algorithm finds k rules where k different substrates com¬ 
pete for the same enzyme. 


Example 2. We illustrate the competitive enzymatic trans¬ 
formation on a small subnetwork of the X-phage model, which 
will be introduced in Section [5| The four rules presented be¬ 
low model the binding of the agent RNAP to the operator site 
of the agent PRE and subsequent production of protein Cl. 
Agent PRE binds either only RNAP (at rate ki+ and or 

simultaneously with CII (at rate k 2 + and k 2 ~). The pro¬ 
tein can be produced whenever PRE and PRE are bound, but 
the rates will be different depending on whether only RNAP is 
bound to the operator (rate k b ), or, in addition, CII is bound 
to the operator (rate k ay ): 

PftE(cii,rnap), i?M4P(pl,p2) 

P/2E(cii,rnap!l), /2i\MP(pl!l,p2)@ki+, ki_ 

PPii(cii,rnap), CTl(pre), PMP(pl,p2) 
o PRE(cii!l,rnap!2), C , Il(pre!l),PW(pl!2,p2)@k a+ ,k a _ 

PPii(cii,rnap!l), PMP(pl!l,p2) 

-A PPii(cii,rnap!l), PMlP(pl!l,p2), 10CT(ci,or)@k b 

PRE( cii! 1,rnap!2), CTJ(pre!l), PW(pl!2,p2) 

-A PRE(cii!l,rnap!2), CTJ(pre!l),PHMP(pl!2,p2), 10CT(ci,or)@k a 

After the competitive enzymatic reduction, the operator PRE 
is eliminated from each of the two competing enzymatic catal¬ 
ysis patterns. Finally, the production of Cl is modelled only 
as a function of RNAP and CII, and the rate is appropriately 
modified: 

RNAP{ pl,p2), CTl(pre) 

-A RNAP( pl,p2), CTl(pre), 10CT(pr,ci)@ k new . 

5. CASE STUDY: a-PHAGE 

The phage A is a virus that infects E.coli cells, and replicates 
using one of the two strategies: lysis or lysogeny. In the ly¬ 
sis strategy, phage A uses the machinery of the E.coli cell 
to replicate itself and then lyses the cell wall, killing the cell 
and allowing the newly formed viruses to escape and infect 
other cells, while in the lysogeny scenario, it inserts its DNA 
into the host cell’s DNA and replicates through normal cell 
division, remaining in a latent state in the host cell (it can 
always revert to the lysis strategy). The decision between 
lysis and lysogeny is known to be influenced by environmen¬ 
tal parameters, as well as the multiplicity of infection and 
variations in the average phage input |1 . 

The key element controlling the decision process is the Or 
operator (shown in Fig. [ 5 ]), which is composed of three op¬ 
erator sites {Ori , Or 2 , OR 3 ) to which transcription factors 
can bind, in order to activate or repress the two promoters 
(. Prm and Pr) overlapping the operator sites. When RNAP 
(RNA polymerase, an enzyme that produces primary tran¬ 
script RNA) binds to Prm , it initiates transcription to the 
left, to produce mRNA transcripts from the cl gene; RNAP 
bound to the Pr promoter, on the other hand, initiates tran¬ 
scription to the right, producing transcripts from the cro 
gene. The two promoters form a genetic switch, since tran- 


Original vs reduction, 10 promoters, 300 RNAP 



Figure 3: Average trace of 10 simulations of the orig¬ 
inal model (red) and the reduced model (green) after 
the reduction, for initially 10 A phage cells (multi¬ 
plicities of infection — MOI’s). The simulation time 
for one simulation trace of the original model is ~ 40 
minutes of CPU time, and of the reduced model is 5 
seconds of CPU time. The initial number of proteins 
Cl, Cro, CII and CIII and N is set to 100. 


scripts can typically only be produced in one direction at a 
time. 

The cl gene codes for the Cl protein, also known as the A 
repressor : in its dimer form (two Cl monomers react to form 
a dimer, C/ 2 ), it is attracted the the Or operator sites in the 
phage’s DNA, repressing the Pr promoter from which Cro 
production is initiated and further activating Cl production. 
Similarly, the cro gene codes for the Cro protein, which also 
dimerizes in order to bind to Or operator sites and prevent 
production from Prm , or even its own production. 

While CI 2 and CV 02 can bind to any of the three operator 
sites at any time, they have a different affinity to each site. 
The CI 2 has its strongest affinity to the Ori operator site, 
next to the Or 2 site, and finally to the Or 3 site (in other 
words, CI 2 first turns off Pr , then activates Prm , and 
finally, represses its own production), while CV 02 has the 
reverse affinity (it first turns off Cl production, then turns 
off its own production). 

The feedback through the binding of the products as tran¬ 
scription factors coupled with the affinities described makes 
the Or operator behave as a genetic bistable switch. In 
one state, Cro is produced locking out production of CL 
In this state, the cell follows the lysis pathway since genes 
downstream of Cro produce the proteins necessary to con¬ 
struct new viruses and lyse the cell. In the other state, Cl is 
produced locking out production of Cro. In this state, the 
cell follows the lysogeny pathway since proteins necessary to 
produce new viruses are not produced. Instead, proteins to 
insert the DNA of the phage into the host cell are produced. 

What’s more, in the lysogeny state, the cell develops an im- 



Multiplicity of infection (MOI) 


Figure 4: Comparison of the probability of 

lysogeny before and after the reduction of the 
model (lysogeny profile is detected if there are 328 
molecules of Cl before there are 133 molecules of 
Cro). The profile was obtained by running 1000 sim¬ 
ulations of the model for one cell cycle (2100 time 
units), for MOIs ranging from 1 to 10. Simulation 
times are as reported in Fig. [3j 


Cl CI 2 



Figure 5: Cl monomers are produced from the cl 
gene; two monomers can form a dimer, that can 
bind to one of the Or operator sites (the Figure is 
taken from [36]). 

munity to further infection: the cro genes found on the DNA 
inserted by further infections of the virus are also shut off by 
CI 2 molecules that are produced by the first virus to com¬ 
mit to lysogeny. Once a cell commits to lysogeny, it becomes 
very stable and does not easily change over to the lysis path¬ 
way. An induction event is necessary to cause the transition 
from lysogeny to lysis. For example, lysogens (i.e., cells with 
phage DNA integrated within their own DNA) that are ex¬ 
posed to UV light end up following the lysis pathway. 

5.1 Results and discussion 

We applied our reduction algorithm to a Kappa model of 
the phage A decision circuit that we built using the reaction- 
based model presented in [36], [32] . 

5.1.1 Intermediate tests 

Intermediate tests were carried out on the portion of the 
circuit that is involved in Cl production from the he Pre 










Simulation events (EGFR signaling) 


promoter and in CII production from the Pr promoter. Ini¬ 
tially, this model contained 10 rules, 5 proteins and 4 species; 
after applying the reduction algorithm, it was reduced to 4 
rules and 3 proteins. 

5.1.2 Full model 

Simulations were carried out on the complete chemical reac¬ 
tion genetic circuit model which contains 96 rules, 16 pro¬ 
teins and 61 species (the contact map is shown in Fig. [7|. 
After applying the reduction, the Kappa model is reduced 
to 11 rules and 5 proteins. 

In Fig. [3] we plot the mean for the Cl copy number obtained 
from 10 runs of the original and of the reduced model, and 
the graphs show agreement. 

In Fig. [4] we compared the probability of lysogeny before 
and after the reduction of the model (lysogeny profile is de¬ 
tected if there are 328 molecules of Cl before there are 133 
molecules of Cro). The graphs show overall agreement in 
predicting the lysogeny profile. More precisely, for two and 
less MOI’s (multiplicities of infection), the probability of 
lysogeny is almost negligible; For three MOIs, both graphs 
show that lysogeny and lysis are equally probable (the re¬ 
duced model reports slightly larger probability), and for five 
or more MOI’s, both graphs show that lysogeny is highly 
probable. While one simulation of the original model takes 
about 40 mins, one simulation of the abstracted model takes 
about 5 seconds. Once again, the results are similar, with a 
significant improvement in simulation speed. 

The tool is available for download 3 . 

6. CONCLUSION AND FUTURE WORK 

In this paper, we presented a method for automated reduc¬ 
tion of rule-based models. The reduction is based on equi¬ 
librium approximations: certain rules and species are elim¬ 
inated and the rates are approximated accordingly. More 
concretely, a number of reaction patterns known to be amenable 
to equilibrium approximations are recognised by static in¬ 
spection of the rules. The crucial aspect of the presented 
approach is that each approximation step can be retrieved 
at any time, and no information about the original, detailed 
model is lost. The presented method can be seen as the first 
step towards a systematic time-scale separation of stochastic 
rule-based models. The guarantees of the presented reduc¬ 
tion method are given for the asymptotic behaviour. Bhat- 
tacharyya distance is proposed as a metric to quantify the 
reduction error with respect to the observable. We plan to 
further investigate how to practically access the approxima¬ 
tion error. To this end, the error can be measured with 
respect to a given observable, or, more generally, with re¬ 
spect to a given property specified in, for example, linear 
temporal logic (LTL). 

We implemented the tool and evaluated it on a case study 
of a lambda phage bistable switch. The simulation of one 
cell cycle was improved from 40mm CPU time to 5sec, and 
the profiles of the observables show agreement. We plan 
to extend the set of approximation patterns so to obtain 
good reductions for complex models of signaling pathways. 
More precisely, while our tool is applicable to any rule-based 
model, the chosen set of approximation patterns are tailored 




Figure 6: a) The ratio of dimerisation events 

vs. total events in lambda phage model. The 
number of dimerisation events takes roughly half 
of the total events over the whole cell cycle, b) 
The ratio of dimerisation events vs. total events in 
EGFR/insulin model. The number of dimerisation 
events takes only a small fraction of the total events 
over the whole cell cycle. 



Figure 7: The contact map of the full A-phage model. 
The model consists of 96 rules, 16 proteins and 61 
species. The reduced model has 11 rules and 5 pro¬ 
teins. 


for GRNs and may not provide significant reductions when 
applied to the signaling pathways. To illustrate this, we 
applied the reduction to the EGF/insuling crosstalk model, 
and we observe that the number of dimerisation events does 
not take the significant portion of all events (see Fig. [6]), at 
least not as radically as it was the case with the lambda 
phage example. To this end, we plan to include more pat¬ 
terns for reducing signaling pathways, by, for example, ap¬ 
proximating multiple phosphorylation events. 
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